Array Operation Synthesis to Optimize HPF Programs
Authors
Abstract
An increasing number of programming languages, such as Fortran 90, HPF, and APL, provide a rich set of intrinsic array functions and array expressions. These constructs, which constitute an important part of data parallel languages, provide excellent opportunities for compiler optimizations. Synthesizing consecutive array operations or array expressions into a composite access function of the source arrays at compile time has been shown [2] to reduce redundant data movement, temporary storage usage, and loop synchronization overhead on flat shared memory parallel machines with uniform memory accesses. However, it remains open how the synthesis scheme can be incorporated into optimizing HPF programs on distributed memory parallel machines by taking communication costs into account. In this paper, we propose solutions to this open problem. We first apply the array synthesis scheme (developed earlier by us for Fortran 90 programs) to HPF programs and demonstrate its performance benefits on distributed memory machines. In addition, to prevent a situation we call "synthesis performance anomaly", we derive a cost model and present an optimal solution based on the cost model to guide the array synthesis process on distributed memory machines. We also show that finding the optimal solution is NP-hard. Therefore, we develop a practical heuristic algorithm that compilers can use to devise a synthesis strategy for HPF programs on distributed memory machines. Experimental results on a DEC Farm show that incorporating our proposed optimizations yields significant performance improvement over the base codes for HPF code fragments from real applications. The array operation synthesis scheme is thus demonstrated to be equally effective for programs running on distributed memory parallel machines.
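The idea of synthesizing consecutive array operations into a composite access function can be sketched with a small example. This is an illustration only, not the paper's algorithm or cost model: plain Python lists stand in for Fortran 90 / HPF arrays, and the helper names `cshift`, `reverse`, `unsynthesized`, and `synthesized` are hypothetical. The first version materializes a temporary per operation; the second reads the source array directly through the composed index function.

```python
# Illustrative sketch only: Python lists stand in for Fortran 90 / HPF arrays.
# All function names here are hypothetical, chosen for this example.

def cshift(a, shift):
    """Circular shift in the style of Fortran's CSHIFT (0-based):
    result[i] = a[(i + shift) mod n]."""
    n = len(a)
    return [a[(i + shift) % n] for i in range(n)]

def reverse(a):
    """Reversed array section, like A(n:1:-1): result[i] = a[n - 1 - i]."""
    n = len(a)
    return [a[n - 1 - i] for i in range(n)]

def unsynthesized(a, b, shift):
    """Evaluate reverse(cshift(a, shift)) + b one operation at a time,
    creating a full temporary array for each intermediate result."""
    t1 = cshift(a, shift)   # first temporary
    t2 = reverse(t1)        # second temporary
    return [x + y for x, y in zip(t2, b)]

def synthesized(a, b, shift):
    """Same result in a single loop with no temporaries.
    Composite access function: t2[i] = t1[n-1-i] = a[(n-1-i+shift) mod n]."""
    n = len(a)
    return [a[(n - 1 - i + shift) % n] + b[i] for i in range(n)]

a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
print(unsynthesized(a, b, 2))  # [12, 21, 35, 44, 53]
print(synthesized(a, b, 2))    # [12, 21, 35, 44, 53]
```

On a distributed memory machine the composed index function also determines which remote elements each processor must fetch, which is why, as the abstract notes, a cost model that includes communication is needed before deciding how much to synthesize.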
Similar resources
Integrating Automatic Data Alignment and Array Operation Synthesis to Optimize Data Parallel Programs
Both automatic data alignment and array operation synthesis have been shown to be very important and effective schemes to optimize data parallel programs. However, the research community has so far considered them separately. In this paper, we address the issue of how to integrate the array operation synthesis scheme into the automatic alignment process. We propose a new array alignment concept...
An Expression-Rewriting Framework to Generate Communication Sets for HPF Programs with Block-Cyclic Distribution
In this paper, we present a new framework based on expression rewritings and a calculus form called CSD calculus to generate the local enumeration set and communication set for HPF programs with Block-Cyclic distribution. Our framework is a practical software framework, and can handle the general cases so that the communication set of HPF programs of “Block-Cyclic” distributions with two-level ...
An Expression-Rewriting Framework to Generic Communication Sets for HPF Programs with Block-Cyclic Distribution
In this paper, we present a new framework based on expression rewritings and a calculus form called CSD calculus to generate the local enumeration set and communication set for HPF programs with Block-Cyclic distribution. Our framework is a practical software framework, and can handle the general cases so that the communication set of HPF programs of “Block-Cyclic” distributions with two-level ...
Communication set generations with CSD calculus and expression-rewriting framework
In this paper, we present a new framework based on expression rewritings and a calculus form called CSD calculus to generate the local enumeration set and communication set for HPF programs with Block-Cyclic distribution. Our framework is a practical software framework, and can handle the general cases so that the communication set of HPF programs of “Block-Cyclic” distributions with two-level ...
Runtime Array Redistribution in HPF Programs
This paper describes efficient algorithms for run-time array redistribution in HPF programs. We consider block(m) to cyclic, cyclic to block(m) and the general cyclic(x) to cyclic(y) type redistributions. We initially describe algorithms for one-dimensional arrays and then extend the methodology to multidimensional arrays. The algorithms are practical enough to be easily implemented in the runti...
Publication date: 1996